Robust experimentation in the continuous time bandit problem

Authors

Abstract

Similar Articles

Continuous Time Associative Bandit Problems

In this paper we consider an extension of the multiarmed bandit problem. In this generalized setting, the decision maker receives some side information, performs an action chosen from a finite set and then receives a reward. Unlike in the standard bandit settings, performing an action takes a random period of time. The environment is assumed to be stationary, stochastic and memoryless. The goal...
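To make the setting described above concrete, the following is a minimal simulation sketch (hypothetical names and distributions, not taken from the paper) of an associative bandit in which each action also consumes a random period of time, so the natural performance measure is reward per unit time.

import random

# Sketch of an associative (contextual) bandit in which each pull also takes a
# random amount of time. All names and distributions are illustrative assumptions.

N_ARMS = 3

def draw_context():
    # Side information observed before choosing an action.
    return random.choice(["low", "high"])

def pull(arm, context):
    # Stationary, stochastic, memoryless environment: reward and completion
    # time depend only on the chosen arm and the observed context.
    mean_reward = (arm + 1) * (2.0 if context == "high" else 1.0)
    reward = random.gauss(mean_reward, 1.0)
    duration = random.expovariate(1.0 / (arm + 1))   # random time the action takes
    return reward, duration

total_reward = 0.0
total_time = 0.0
for _ in range(10_000):
    context = draw_context()
    arm = random.randrange(N_ARMS)                   # placeholder uniform policy
    reward, duration = pull(arm, context)
    total_reward += reward
    total_time += duration

print("reward per unit time:", total_reward / total_time)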

On the Optimal Reward Function of the Continuous Time Multiarmed Bandit Problem

The optimal reward function associated with the so-called "multiarmed bandit problem" for general Markov-Feller processes is considered. It is shown that this optimal reward function has a simple expression (product form) in terms of individual stopping problems, without requiring any smoothness properties of the optimal reward function, either for the global problem or for the individual stopping probl...
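The "individual stopping problems" mentioned above can be made concrete: in the classical theory, the Gittins index of a single arm with state process X, reward rate r, and discount rate β is itself defined through such a stopping problem. The display below is the standard characterization, not the particular product-form expression derived in the cited paper.

\[
\gamma(x) \;=\; \sup_{\tau > 0}\;
\frac{\mathbb{E}_x\!\left[\int_0^{\tau} e^{-\beta t}\, r(X_t)\, \mathrm{d}t\right]}
     {\mathbb{E}_x\!\left[\int_0^{\tau} e^{-\beta t}\, \mathrm{d}t\right]}
\]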

Robust Control of the Multi-armed Bandit Problem

We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem. We then characterize the optimal policy of the robust MAB as a project-by-projec...
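For orientation, one natural way to write a robust counterpart of the index displayed after the previous abstract is the max-min form below, where the expectation is taken in the worst case over an ambiguity set \(\mathcal{P}\) of transition probabilities. This is an illustration of the idea, not necessarily the exact definition used in the cited paper.

\[
\gamma^{\mathrm{rob}}(x) \;=\; \sup_{\tau \ge 1}\; \inf_{\mathbb{Q} \in \mathcal{P}}\;
\frac{\mathbb{E}^{\mathbb{Q}}_x\!\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r(X_t)\right]}
     {\mathbb{E}^{\mathbb{Q}}_x\!\left[\sum_{t=0}^{\tau-1} \beta^{t}\right]}
\]

Each arm is evaluated in isolation against its own worst-case transition law, and a project-by-project policy then pulls the arm with the highest robust index.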

Robust Contracts in Continuous Time

We study two types of robust contracting problem under hidden action in continuous time. In the type I problem, the principal is ambiguous about the project cash flows, while in the type II problem he is ambiguous about the agent's beliefs. The principal designs a robust contract that maximizes his utility under the worst-case scenario subject to the agent's incentive and participation constraints. We ...
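Schematically, the contracting problem described above is a max-min program of the following form; the notation is illustrative rather than the paper's.

\[
\max_{C}\; \min_{\mathbb{Q} \in \mathcal{Q}}\; \mathbb{E}^{\mathbb{Q}}\!\left[U_P(C)\right]
\quad \text{subject to} \quad
a \in \arg\max_{a'} \mathbb{E}\!\left[U_A(C, a')\right] \;\;\text{(incentive compatibility)},
\qquad
\mathbb{E}\!\left[U_A(C, a)\right] \ge \underline{U} \;\;\text{(participation)}
\]

Here \(C\) is the contract, \(a\) the hidden action, and \(\mathcal{Q}\) the set of priors the principal entertains: over project cash flows in the type I problem and over the agent's beliefs in the type II problem.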

Finite-Time Regret Bounds for the Multiarmed Bandit Problem

We show finite-time regret bounds for the multiarmed bandit problem under the assumption that all rewards come from a bounded and fixed range. Our regret bounds after any number T of pulls are of the form a + b log T + c log² T, where a, b, and c are positive constants not depending on T. These bounds are shown to hold for variants of the popular ε-greedy and Boltzmann allocation rules, and for a ...
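As a companion to the abstract above, here is a minimal ε-greedy allocation rule for a bandit with rewards in a bounded range, together with an empirical regret estimate. The arm means, horizon, and constant exploration rate are illustrative assumptions, not the exact variant whose finite-time bounds are proved in the paper.

import random

# Minimal epsilon-greedy allocation rule for a multiarmed bandit with
# Bernoulli (hence bounded) rewards. All parameters are illustrative.
def epsilon_greedy(arm_means, horizon=10_000, epsilon=0.1):
    n_arms = len(arm_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0
    for t in range(horizon):
        if t < n_arms:
            arm = t                                            # pull each arm once
        elif random.random() < epsilon:
            arm = random.randrange(n_arms)                     # explore
        else:
            arm = max(range(n_arms), key=estimates.__getitem__)  # exploit
        reward = 1.0 if random.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total_reward += reward
    # Empirical regret relative to always pulling the best arm in expectation.
    return horizon * max(arm_means) - total_reward

print("empirical regret after T pulls:", epsilon_greedy([0.3, 0.5, 0.7]))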

Journal

Journal title: Economic Theory

Year: 2020

ISSN: 0938-2259, 1432-0479

DOI: 10.1007/s00199-020-01328-3